435 research outputs found

    REAPR: a universal tool for genome assembly evaluation.

    Methods to reliably assess the accuracy of genome sequence data are lacking. Currently completeness is only described qualitatively and mis-assemblies are overlooked. Here we present REAPR, a tool that precisely identifies errors in genome assemblies without the need for a reference sequence. We have validated REAPR on complete genomes or de novo assemblies from bacteria, malaria and Caenorhabditis elegans, and demonstrate that 86% and 82% of the human and mouse reference genomes are error-free, respectively. When applied to an ongoing genome project, REAPR provides corrected assembly statistics allowing the quantitative comparison of multiple assemblies. REAPR is available at http://www.sanger.ac.uk/resources/software/reapr/

    Nitroheterocyclic drug resistance mechanisms in Trypanosoma brucei

    OBJECTIVES: The objective of this study was to identify the mechanisms of resistance to nifurtimox and fexinidazole in African trypanosomes. METHODS: Bloodstream-form Trypanosoma brucei were selected for resistance to nifurtimox and fexinidazole by stepwise exposure to increasing drug concentrations. Clones were subjected to WGS to identify putative resistance genes. Transgenic parasites modulating expression of genes of interest were generated and drug susceptibility phenotypes determined. RESULTS: Nifurtimox-resistant (NfxR) and fexinidazole-resistant (FxR) parasites shared reciprocal cross-resistance suggestive of a common mechanism of action. Previously, a type I nitroreductase (NTR) has been implicated in nitro drug activation. WGS of resistant clones revealed that NfxR parasites had lost >100 kb from one copy of chromosome 7, rendering them hemizygous for NTR as well as over 30 other genes. FxR parasites retained both copies of NTR, but lost >70 kb downstream of one NTR allele, decreasing NTR transcription by half. A single knockout line of NTR displayed 1.6- and 1.9-fold resistance to nifurtimox and fexinidazole, respectively. Since NfxR and FxR parasites are ∼6- and 20-fold resistant to nifurtimox and fexinidazole, respectively, additional factors must be involved. Overexpression and knockout studies ruled out a role for a putative oxidoreductase (Tb927.7.7410) and a hypothetical gene (Tb927.1.1050), previously identified in a genome-scale RNAi screen. CONCLUSIONS: NTR was confirmed as a key resistance determinant, either by loss of one gene copy or loss of gene expression. Further work is required to identify which of the many dozens of SNPs identified in the drug-resistant cell lines contribute to the overall resistance phenotype

    BamView: visualizing and interpretation of next-generation sequencing read alignments.

    So-called next-generation sequencing (NGS) has provided the ability to sequence on a massive scale at low cost, enabling biologists to perform powerful experiments and gain insight into biological processes. BamView has been developed to visualize and analyse sequence reads from NGS platforms that have been aligned to a reference sequence. It is a desktop application for browsing the aligned or mapped reads [Ruffalo, M, LaFramboise, T, Koyutürk, M. Comparative analysis of algorithms for next-generation sequencing read alignment. Bioinformatics 2011;27:2790-6] at different levels of magnification, from nucleotide level, where the base qualities can be seen, to genome or chromosome level, where overall coverage is shown. To enable in-depth investigation of NGS data, various views are provided that can be configured to highlight interesting aspects of the data. Multiple read alignment files can be overlaid to compare results from different experiments, and filters can be applied to facilitate the interpretation of the aligned reads. As well as being a standalone application, it can be used as an integrated part of the Artemis genome browser. BamView allows the user to study NGS data in the context of the sequence and annotation of the reference genome. Single nucleotide polymorphism (SNP) density and candidate SNP sites can be highlighted and investigated, and read-pair information can be used to discover large structural insertions and deletions. The application also calculates simple analyses of the read mapping, including read counts and reads per kilobase per million mapped reads (RPKM) for genes selected by the user.
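
    The RPKM measure named in this abstract can be illustrated with a minimal sketch. It uses the standard definition (read count scaled by gene length in kilobases and by millions of mapped reads). It is not BamView's own code; the function name `rpkm` and the example numbers are hypothetical.

```python
def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads (standard definition).

    This is an illustrative helper, not part of BamView.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 combines the two scale factors: 1e3 (bases per kilobase)
    # and 1e6 (reads per million mapped reads).
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp gene, 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

    Normalising by both gene length and library size lets read counts be compared between genes of different lengths and between experiments of different sequencing depth.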

    Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps

    Advances in sequencing technology allow genomes to be sequenced at vastly decreased costs. However, the assembled data are frequently highly fragmented, with many gaps. We present a practical approach that uses Illumina sequences to improve draft genome assemblies by aligning sequences against contig ends and performing local assemblies to produce gap-spanning contigs. The continuity of a draft genome can thus be substantially improved, often without the need to generate new data.

    GOtcha: a new method for prediction of protein function assessed by the annotation of seven genomes

    BACKGROUND: The function of a novel gene product is typically predicted by transitive assignment of annotation from similar sequences. We describe a novel method, GOtcha, for predicting gene product function by annotation with Gene Ontology (GO) terms. GOtcha predicts GO term associations with term-specific probability (P-score) measures of confidence. Term-specific probabilities are a novel feature of GOtcha and allow the identification of conflicts or uncertainty in annotation. RESULTS: The GOtcha method was applied to the recently sequenced genome for Plasmodium falciparum and six other genomes. GOtcha was compared quantitatively for retrieval of assigned GO terms against direct transitive assignment from the highest scoring annotated BLAST search hit (TOPBLAST). GOtcha exploits information deep into the 'twilight zone' of similarity search matches, making use of much information that is otherwise discarded by more simplistic approaches. At a P-score cutoff of 50%, GOtcha provided 60% better recovery of annotation terms and 20% higher selectivity than annotation with TOPBLAST at an E-value cutoff of 10^-4. CONCLUSIONS: The GOtcha method is a useful tool for genome annotators. It has identified both errors and omissions in the original Plasmodium falciparum annotation and is being adopted by many other genome sequencing projects.

    A comprehensive evaluation of assembly scaffolding tools

    Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data. Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.

    RATT: Rapid Annotation Transfer Tool

    Second-generation sequencing technologies have made large-scale sequencing projects commonplace. However, making use of these datasets often requires gene function to be ascribed genome wide. Although tool development has kept pace with the changes in sequence production, for tasks such as mapping, de novo assembly or visualization, genome annotation remains a challenge. We have developed a method to rapidly provide accurate annotation for new genomes using previously annotated genomes as a reference. The method, implemented in a tool called RATT (Rapid Annotation Transfer Tool), transfers annotations from a high-quality reference to a new genome on the basis of conserved synteny. We demonstrate that a Mycobacterium tuberculosis genome or a single 2.5 Mb chromosome from a malaria parasite can be annotated in less than five minutes with only modest computational resources. RATT is available at http://ratt.sourceforge.net

    Progression of the canonical reference malaria parasite genome from 2002–2019

    Here we describe the ways in which the sequence and annotation of the Plasmodium falciparum reference genome have changed since its publication in 2002. Because P. falciparum is the malaria species responsible for the most deaths worldwide, the richness of its annotation and the accuracy of its sequence are important resources for the P. falciparum research community, as well as the basis for interpreting the genomes of subsequently sequenced species. At the time of publication in 2002, over 60% of predicted genes had unknown functions. As of March 2019, this figure had decreased to 33%. The reduction is due to the inclusion of genes that were subsequently characterised experimentally and genes with significant similarity to others with known functions. In addition, the structural annotation of genes has been significantly refined; 27% of gene structures have been changed since 2002, comprising changes in exon-intron boundaries, addition or deletion of exons and the addition or deletion of genes. The sequence has also undergone significant improvements. In addition to the correction of a large number of single-base and insertion or deletion errors, a major misassembly between the subtelomeres of chromosomes 7 and 8 has been corrected. As the number of sequenced isolates continues to grow rapidly, a single reference genome will not be an adequate basis for interpreting intra-species sequence diversity. We therefore describe in this publication a population reference genome of P. falciparum, called Pfref1. This reference will enable the community to map to regions that are not present in the current assembly. P. falciparum 3D7 will continue to be maintained, with ongoing curation ensuring continual improvements in annotation quality.

    Transcriptome analysis of reproductive tissue and intrauterine developmental stages of the tsetse fly (Glossina morsitans morsitans)

    Background: Tsetse flies, vectors of African trypanosomes, undergo viviparous reproduction (the deposition of live offspring). This reproductive strategy results in a large maternal investment and the deposition of a small number of progeny during a female's lifespan. The reproductive biology of tsetse has been studied on a physiological level; however, the molecular analysis of tsetse reproduction requires deeper investigation. To build a foundation on which to base molecular studies of tsetse reproduction, a cDNA library was generated from female tsetse (Glossina morsitans morsitans) reproductive tissues and the intrauterine developmental stages. 3438 expressed sequence tags were sequenced and analyzed. Results: Analysis of a nonredundant catalogue of 1391 contigs resulted in 520 predicted proteins. 475 of these proteins were full length. We predict that 412 of these represent cytoplasmic proteins while 57 are secreted. Comparison of these proteins with other tissue-specific tsetse cDNA libraries (salivary gland, fat body/milk gland, and midgut) identified 51 that are unique to the reproductive/immature cDNA library. 11 unique proteins were homologous to uncharacterized putative proteins within the NR database, suggesting the identification of novel genes associated with reproductive functions in other insects (hypothetical conserved). The analysis also yielded seven putative proteins without significant homology to sequences present in the public database (unknown genes). These proteins may represent unique functions associated with tsetse's viviparous reproductive cycle. RT-PCR analysis of hypothetical conserved and unknown contigs was performed to determine basic tissue and stage specificity of the expression of these genes. Conclusion: This paper identifies 51 putative proteins specific to a tsetse reproductive/immature EST library. 11 of these proteins correspond to hypothetical conserved genes and 7 proteins are tsetse specific.